Yeah, that makes sense. The set of books (in a certain coding) actually is the set of integers (in the same code). As you say, the diagonal book in my example has an infinite number of non-null characters rather than an arbitrarily large number of non-null characters followed by infinite null characters. Oops..Manxome wrote:There's no limit to the magnitude of the natural numbers, either; in fact, there's a pretty trivial bijection between the set of natural numbers and the set of all strings made from an alphabet with only one symbol. (Not that it's that hard with multiple symbols: a, b, aa, ab, ba, bb, aaa, ...)CatharzGodfoot wrote:Because there is no limit to the length of a "string" or a "book", they can be viewed as infinite sequences consisting of their total contents followed by an infinite number of null characters. Then the first case is really the second case, and there is an unenumerable quantity of books in the library (provable by diagonalization).
The diagonalization argument requires that the "diagonal" string be allowed to have an infinite number of non-null characters. If every individual string has finite length (before your decision to append infinite nulls), then the books are still countable, because your "diagonal" book isn't a member of the defined set, and so the fact that it's not on the list is no difficulty.
There's no reason you couldn't define the problem to include books of infinite length, but there clearly exist infinite but countable sets of strings, so there's no reason that you must define the problem that way, either.
Does that all make sense?
Library of Thought Experimenting...ness
Moderator: Moderators
- CatharzGodfoot
- King
- Posts: 5668
- Joined: Fri Mar 07, 2008 7:54 pm
- Location: North Carolina
OK, has this thread turned into an infinity discussion? Because this could be a lengthy topic indeed. Infinity is the first order of absurdity and a lot of rational assumptions cease to apply when confronted with the simple logic of various levels of infinity.
Infinity cannot grow by addition. Infinity + Infinity = Infinity. My favorite infinity argument (which I think I got from Asimov) is the infamous infinity hotel. Assume the hotel has an infinite number of rooms all given room numbers. Assume the hotel is completely booked. Now assume an infinite bus comes along with an infinite number of guests. Can we put them into the infinity hotel? YES. Because for each room there must exist a room with one higher room number. For each person wanting a room the hotel must make an announcement forcing everyone to move to the next highest room number. This frees up room 1 for occupancy. This can be repeated infinitely, but in doing so your guests will be infinitely irritated.
This doesn't cover the set of rational and irrational numbers which cannot be contained in the set of rational numbers. Thus you can put an infinite number of infinities of any order in the infinite set of the next higher order. In other words you can may any infinite sequence as a single unique irrational number within the universe of Aleph 1 of all irrational and rational numbers.
Infinity cannot grow by addition. Infinity + Infinity = Infinity. My favorite infinity argument (which I think I got from Asimov) is the infamous infinity hotel. Assume the hotel has an infinite number of rooms all given room numbers. Assume the hotel is completely booked. Now assume an infinite bus comes along with an infinite number of guests. Can we put them into the infinity hotel? YES. Because for each room there must exist a room with one higher room number. For each person wanting a room the hotel must make an announcement forcing everyone to move to the next highest room number. This frees up room 1 for occupancy. This can be repeated infinitely, but in doing so your guests will be infinitely irritated.
This doesn't cover the set of rational and irrational numbers which cannot be contained in the set of rational numbers. Thus you can put an infinite number of infinities of any order in the infinite set of the next higher order. In other words you can may any infinite sequence as a single unique irrational number within the universe of Aleph 1 of all irrational and rational numbers.
Not that big a math guy, but there are an infinite number of fractions, or values, between 1 and 2, or any two given values, right? So whenever you talk about infinity aren't you really talking about infinity to the infinitieth power? Not that there's a difference as far as I know.
P.S. I still don't quite understand what the original topic was. If someone can explain the point to me that'd be great. Unless the whole point was actually just to make random speculations about a theoretically infinite library containing every book that ever has or ever will be written, in which case I already got it(and don't particularly want it).
P.S. I still don't quite understand what the original topic was. If someone can explain the point to me that'd be great. Unless the whole point was actually just to make random speculations about a theoretically infinite library containing every book that ever has or ever will be written, in which case I already got it(and don't particularly want it).
I think the point was exactly that. And you can talk about infinity with even bigger infinities still existing - the first difference would be between what you could count up to some arbitrary point (e.g. integers) and what you can't (e.g. fractions).Caliborn wrote:Not that big a math guy, but there are an infinite number of fractions, or values, between 1 and 2, or any two given values, right? So whenever you talk about infinity aren't you really talking about infinity to the infinitieth power? Not that there's a difference as far as I know.
P.S. I still don't quite understand what the original topic was. If someone can explain the point to me that'd be great. Unless the whole point was actually just to make random speculations about a theoretically infinite library containing every book that ever has or ever will be written, in which case I already got it(and don't particularly want it).
Hans Freyer, s.b.u.h. wrote:A manly, a bold tone prevails in history. He who has the grip has the booty.
Huston Smith wrote:Life gives us no view of the whole. We see only snatches here and there, (...)
brotherfrancis75 wrote:Perhaps you imagine that Ayn Rand is our friend? And the Mont Pelerin Society? No, those are but the more subtle versions of the Bolshevik Communist Revolution you imagine you reject. (...) FOX NEWS IS ALSO COMMUNIST!
LDSChristian wrote:True. I do wonder which is worse: killing so many people like Hitler did or denying Christ 3 times like Peter did.
Is that actually true? I mean, if you have a hotel with infinite rooms, and it is completely booked, that means every single room has the status "occupied" and there is no room with the status "vacant". If folks show up, the hotel has no place to put them, because for every room, the fact that it is occupied means that it is not vacant, and it applies to every room.tzor wrote:Infinity cannot grow by addition. Infinity + Infinity = Infinity. My favorite infinity argument (which I think I got from Asimov) is the infamous infinity hotel. Assume the hotel has an infinite number of rooms all given room numbers. Assume the hotel is completely booked. Now assume an infinite bus comes along with an infinite number of guests. Can we put them into the infinity hotel? YES. Because for each room there must exist a room with one higher room number. For each person wanting a room the hotel must make an announcement forcing everyone to move to the next highest room number. This frees up room 1 for occupancy. This can be repeated infinitely, but in doing so your guests will be infinitely irritated.
There are different infinities that are not the same size as each other.Caliborn wrote:Not that big a math guy, but there are an infinite number of fractions, or values, between 1 and 2, or any two given values, right? So whenever you talk about infinity aren't you really talking about infinity to the infinitieth power? Not that there's a difference as far as I know.
The size of the set of all natural numbers (0, 1, 2, 3, 4...) is the first infinity; it's usually called "countably infinite" or aleph-null. The size of the set of all real numbers (which includes all those values between 1 and 2) is a bigger infinity. Interestingly, the rational numbers (which include all terminating or repeating decimals, and all numbers that can be expressed as a fraction of two integers) are countable (like the naturals).
There's a big conjecture called the continuum hypothesis that says that the size of the reals is the second infinity (that is, that there's no cardinality that is strictly greater than that of the naturals and smaller than that of the reals). This turns out to be independent of the axioms most commonly taken as the foundation of mathematics (ZFC), and as such cannot be proven or disproven based on accepted math.
You probably have to state the problem more rigorously if you really want to argue about its correctness. But the take-home lesson is that infinite sets have proper subsets that are the same size as themselves. For example, the cardinality of all the integers and the cardinality of all the even integers are exactly the same, even though the first set contains everything in the second set plus an infinite amount more.Gelare wrote:Is that actually true? I mean, if you have a hotel with infinite rooms, and it is completely booked, that means every single room has the status "occupied" and there is no room with the status "vacant". If folks show up, the hotel has no place to put them, because for every room, the fact that it is occupied means that it is not vacant, and it applies to every room.
(So instead of telling each guest to move to the next higher room number, you should really be telling them all to go to the room whose number is twice their current room number, and then putting all the new people in the odd rooms. Fewer steps that way.)
Actually, the amount (cardinality) of fractions is the same as the cardinality of integers. The real numbers have a higher cardinal number, however.Bigode wrote:I think the point was exactly that. And you can talk about infinity with even bigger infinities still existing - the first difference would be between what you could count up to some arbitrary point (e.g. integers) and what you can't (e.g. fractions).Caliborn wrote:Not that big a math guy, but there are an infinite number of fractions, or values, between 1 and 2, or any two given values, right? So whenever you talk about infinity aren't you really talking about infinity to the infinitieth power? Not that there's a difference as far as I know.
P.S. I still don't quite understand what the original topic was. If someone can explain the point to me that'd be great. Unless the whole point was actually just to make random speculations about a theoretically infinite library containing every book that ever has or ever will be written, in which case I already got it(and don't particularly want it).
If we simply talk about the set of rational numbers we can represent all numbers as a matrix of X/Y with X being the denominator and Y being the numerator. In that case the set of rational numbers can be traversed by a diagonal travel like thisCaliborn wrote:Not that big a math guy, but there are an infinite number of fractions, or values, between 1 and 2, or any two given values, right? So whenever you talk about infinity aren't you really talking about infinity to the infinitieth power? Not that there's a difference as far as I know.
Code: Select all
XX 01 02 03 04 05 06 07 08 09
01 1 2 4 7 11 16 22
02 3 5 8 12 17 ..
03 6 9 13 18
04 10 14 19
05 15 20
06 21
07
08
09Deep down I have this gut feeling that the problem with the known universe is that it is not "real" but "rational" that is only an aleph-null order of infinity. This might explain quantum mechanics. Likewise I think that the actual space time universe is real (the set of rational and irrational numbers) but there is a whole portion of space time we cannot observe (because it does not fall into the rational part of the universe) but influences the space time curve. (This might explain "dark mater;" we can't see it because it's not in our phase the space time.)
- CatharzGodfoot
- King
- Posts: 5668
- Joined: Fri Mar 07, 2008 7:54 pm
- Location: North Carolina
It's true for the following reason: Say you have two infinite hotels. One has all of its rooms booked, the other has none of its rooms booked. When you combine the hotels, you have twice as many rooms, right?Gelare wrote: Is that actually true? I mean, if you have a hotel with infinite rooms, and it is completely booked, that means every single room has the status "occupied" and there is no room with the status "vacant". If folks show up, the hotel has no place to put them, because for every room, the fact that it is occupied means that it is not vacant, and it applies to every room.
No, because you can index the rooms in both hotels using the rooms in only one hotel. Use room 1 of the full hotel to index itself, room 2 of the full hotel to index room 1 of the empty hotel, room 3 of the full hotel to index room 2, and so forth.
This is exactly analogous to using the set of counting numbers (0, 1, 2, 3, ...) to index the integers (..., -2, -1, 0, 1, ...).
About the infinity hotel,
Let's say you have two infinite lists of entries, one of which has all the entries 0, and one of which has all the entries 1. The maximum value of any entry is 1. You now need to add 1 someplace. You can't stick it in the list with the infinite 1's, because it's already full, because every entry is a 1. So you have to add it in the list of 0's. Does that make sense, or is it totally wrong?
Possibly I've reformulated the original problem slightly in my interpretation, which might have changed the answer.
Let's say you have two infinite lists of entries, one of which has all the entries 0, and one of which has all the entries 1. The maximum value of any entry is 1. You now need to add 1 someplace. You can't stick it in the list with the infinite 1's, because it's already full, because every entry is a 1. So you have to add it in the list of 0's. Does that make sense, or is it totally wrong?
Possibly I've reformulated the original problem slightly in my interpretation, which might have changed the answer.
OK, let me see if I can formulate it both ways.
Let's say that you start putting occupied rooms in a set: {occupied room #1, occupied room #2, occupied room #3, ...}. You put an infinite number of occupied rooms in that set, and you never add an unoccupied room to the set. If you now ask "how many empty rooms do you have in the set?" the answer is zero.
But let's say that you have an infinite set of rooms and another infinite set of occupants. Now let's say that you want to create an index that lets you look up who is staying in a room, or what room a given person is staying in (that is, you construct a function from one set to the other). And let's say that after you create this index, every person is assigned to exactly one room, and every room has exactly one occupant (that is, the function is one-to-one).
Now you're given a second set of people who want to stay in your rooms, and this set is also infinite (and the same order of infinity as the other sets). You can't put them anywhere without changing room assignments or doubling people up, because your previous index already assigns one person to every room. However, if you throw away your previous index, you can create a new index that maps between the union of the two sets of people and the set of rooms, and this new index can also be one-to-one (every person gets exactly one room, each room has exactly one occupant), even though you've added a bunch of people and haven't added any more rooms.
First, I'm going to do the proof for the reals, because that's easier than the irrationals. But since the rationals are countable, and the reals only contain the rationals and the irrationals, if the reals are uncountable, then the irrationals must be uncountable, too.
So, let's assume by way of contradiction that the reals are countable. That means we can construct some sort of list of all the reals. We don't know what order they would be in, but just as an example, it might look something like this:
#1: 123487.128394608971238496097832.....
#2: 4
#3: 3.1234878990097
#4: 0.0000000000000002
If we can show that there is a real number that is not on this list, without assuming anything about the list, then this isn't really a list of the real numbers, and so no such list can exist.
To keep everything nice and even, let's only look at the digits after the decimal point, and let's just imagine an infinite number of zeroes tacked onto the end of any decimal that terminates:
#1: ~.128394608971238496097832...
#2: ~.0000000000...
#3: ~.12348789900970000000...
#4: ~.0000000000000002000000...
To construct a number not on the list, let's build it one digit at a time, as follows:
The Nth digit of our number is equal to one plus the Nth digit of the Nth number on the list, unless that digit is a 9, in which case we use 0 instead.
So our number for the example above begins 0.2141...
#1: ~.128394608971238496097832...
#2: ~.0000000000...
#3: ~.12348789900970000000...
#4: ~.0000000000000002000000...
So where in the list is our new number? It can't be equal to the first number in the list, because they differ at the first digit after the decimal point. It can't be equal to the second number in the list, because they differ at the second digit after the decimal point. It can't be equal to the third number etc., etc.
Since we've just constructed a new number that isn't on the list, without knowing anything at all about the list, no complete list can exist.
(Technically, you also have to guard against the 0.999.... = 1 special case, but that's not hard.)
Let's say that you start putting occupied rooms in a set: {occupied room #1, occupied room #2, occupied room #3, ...}. You put an infinite number of occupied rooms in that set, and you never add an unoccupied room to the set. If you now ask "how many empty rooms do you have in the set?" the answer is zero.
But let's say that you have an infinite set of rooms and another infinite set of occupants. Now let's say that you want to create an index that lets you look up who is staying in a room, or what room a given person is staying in (that is, you construct a function from one set to the other). And let's say that after you create this index, every person is assigned to exactly one room, and every room has exactly one occupant (that is, the function is one-to-one).
Now you're given a second set of people who want to stay in your rooms, and this set is also infinite (and the same order of infinity as the other sets). You can't put them anywhere without changing room assignments or doubling people up, because your previous index already assigns one person to every room. However, if you throw away your previous index, you can create a new index that maps between the union of the two sets of people and the set of rooms, and this new index can also be one-to-one (every person gets exactly one room, each room has exactly one occupant), even though you've added a bunch of people and haven't added any more rooms.
Let me see if I can help you out.tzor wrote:I've seen a hand waving argument for how you can't do that for the set of irrational numbers but it's really hard to get.
First, I'm going to do the proof for the reals, because that's easier than the irrationals. But since the rationals are countable, and the reals only contain the rationals and the irrationals, if the reals are uncountable, then the irrationals must be uncountable, too.
So, let's assume by way of contradiction that the reals are countable. That means we can construct some sort of list of all the reals. We don't know what order they would be in, but just as an example, it might look something like this:
#1: 123487.128394608971238496097832.....
#2: 4
#3: 3.1234878990097
#4: 0.0000000000000002
If we can show that there is a real number that is not on this list, without assuming anything about the list, then this isn't really a list of the real numbers, and so no such list can exist.
To keep everything nice and even, let's only look at the digits after the decimal point, and let's just imagine an infinite number of zeroes tacked onto the end of any decimal that terminates:
#1: ~.128394608971238496097832...
#2: ~.0000000000...
#3: ~.12348789900970000000...
#4: ~.0000000000000002000000...
To construct a number not on the list, let's build it one digit at a time, as follows:
The Nth digit of our number is equal to one plus the Nth digit of the Nth number on the list, unless that digit is a 9, in which case we use 0 instead.
So our number for the example above begins 0.2141...
#1: ~.128394608971238496097832...
#2: ~.0000000000...
#3: ~.12348789900970000000...
#4: ~.0000000000000002000000...
So where in the list is our new number? It can't be equal to the first number in the list, because they differ at the first digit after the decimal point. It can't be equal to the second number in the list, because they differ at the second digit after the decimal point. It can't be equal to the third number etc., etc.
Since we've just constructed a new number that isn't on the list, without knowing anything at all about the list, no complete list can exist.
(Technically, you also have to guard against the 0.999.... = 1 special case, but that's not hard.)
- Josh_Kablack
- King
- Posts: 5317
- Joined: Fri Mar 07, 2008 7:54 pm
- Location: Online. duh
Ever hear of Lossy Compression?Even if you allowed books to be infinitely long, there still doesn't exist a list of them all, because then the set of books would be uncountably infinite. It
As much as the masturbation about Cantor's transfinite numbers is both correct and yet pointless (since Borges posits a technically finite library) this assertion is still wrong. It assumes that the relationship between list entry and book title is one to one, or at least finite to finite. But that's not part of the problem statement. It's quite possible than the relationship is one to many, one to infinite, zero to many or even zero to infinity.
The assertion that the existence of an index is impossible because there are an uncountable infinite number of books is exactly the same as asserting that the existence of maps is impossible because any space contains an uncountably infinite number of points. And yet maps do exist, and most of them are quite useful - even if none of them provide a truly complete representation of the areas they represent.
Much the way a map to my house includes a carefully ordered list of exact street names and distances, but omits infinite amounts of both relevant information (elevation, longitude, latitude, distance from the equator, the color of the cars parked outside, conversion from cubits to cubic meters) and irrelevant information (current temperature,, price of tea in China, name of that kid killed in Gaza) - an index could very well include only a *partial* enumeration supplemented by rules about uncountable sets. Some examples could be "all books more than three hexagons away are full of lies" "two-thirds of all books with red covers are cryptograms", "any book that starts with the letter I is factual, unless it ends with N", "books not in this index are not worth your concern" and so on and so forth. Of course, it would be very easy for any group of such rules to contradict themselves in certain cases.
To summarize: all that Cantor's Transfinite theory tells us is that "The Wholly Complete and Accurate Listing of All Book Titles and their Locations in the Entire Library" cannot be a non-fiction work in the library. However, "A Very Useful Index to Many Books In Nearby Hexagons and How to Identify Other Useful Guides in Further Hexagons" totally can.
Last edited by Josh_Kablack on Fri Jan 16, 2009 11:00 pm, edited 1 time in total.
"But transportation issues are social-justice issues. The toll of bad transit policies and worse infrastructure—trains and buses that don’t run well and badly serve low-income neighborhoods, vehicular traffic that pollutes the environment and endangers the lives of cyclists and pedestrians—is borne disproportionately by black and brown communities."
The string "xyzzy" is the compression of any information you want under some compression system--it can even be lossless. But that is not useful or interesting in any meaningful fashion if the information can't be decompressed.
What you described is certainly a useful book, but it is not a guide to all books any more than the map in your analogy is a map of all points in space; it is a guide to a relatively small number of interesting books/points. Whether that exists may be an interesting question, but it is a different question than whether a log of all books exists. Admittedly, the original question was rather vague, but that is hardly the answerer's fault.
And the answer to whether "A Very Useful Index to Many Books In Nearby Hexagons and How to Identify Other Useful Guides in Further Hexagons" exists is a definite: maybe. Many volumes that claim to be such a guide will of course exist, but whether or not any of them are accurate will depend on the organization of the library, which is not specified. For example, there will be a book that says "the next book to the left is entirely true." Whether this book is itself true or not will depend on what it is shelved next to. It is entirely possible, in either a finite or infinite library, that none of these relative guides will happen to be shelved in locations that make them true. Similarly, absolute but categorical statements like "all red-covered books are true" could be either true or false depending on how cover colors are assigned, which is also not specified.
So in a sense, that's an even less interesting question, because we can't even answer it one way or the other.
What you described is certainly a useful book, but it is not a guide to all books any more than the map in your analogy is a map of all points in space; it is a guide to a relatively small number of interesting books/points. Whether that exists may be an interesting question, but it is a different question than whether a log of all books exists. Admittedly, the original question was rather vague, but that is hardly the answerer's fault.
And the answer to whether "A Very Useful Index to Many Books In Nearby Hexagons and How to Identify Other Useful Guides in Further Hexagons" exists is a definite: maybe. Many volumes that claim to be such a guide will of course exist, but whether or not any of them are accurate will depend on the organization of the library, which is not specified. For example, there will be a book that says "the next book to the left is entirely true." Whether this book is itself true or not will depend on what it is shelved next to. It is entirely possible, in either a finite or infinite library, that none of these relative guides will happen to be shelved in locations that make them true. Similarly, absolute but categorical statements like "all red-covered books are true" could be either true or false depending on how cover colors are assigned, which is also not specified.
So in a sense, that's an even less interesting question, because we can't even answer it one way or the other.
-
Draco_Argentum
- Duke
- Posts: 2434
- Joined: Fri Mar 07, 2008 7:54 pm
Nah, it totally works, it's just not usually useful. Consider the following compression scheme:Draco_Argentum wrote:Tangent: Surely there is a limit to the maximum compression ratio achievable by lossless compression. I don't know much of anything about information theory but being able to convert absolutely anything into xyzzy sounds like a load.
1. If the information you are compressing is the Encyclopedia Britanica, output "0".
2. Otherwise, output "1", followed by the original information.
This "compresses" the entire Encyclopedia Britanica to a single bit. Of course, it makes any other information you want to "compress" longer, and you need to already have the entire Encyclopedia Britanica in order to decompress it, but a decompression algorithm exists that will always get back exactly the original input.
And if that sounds silly, remember that the maximum average lossless compression ratio for arbitrary information is actually one--that is, no compression at all. "Compression" is actually always a way to say a few things efficiently at the cost of making everything else longer (or impossible) to say. If there's a relatively small number of messages that you send very often, and a bunch of stuff that you almost never say, that's actually a very good deal--but there's no "free lunch."
Last edited by Manxome on Sat Jan 17, 2009 1:43 am, edited 1 time in total.
- Josh_Kablack
- King
- Posts: 5317
- Joined: Fri Mar 07, 2008 7:54 pm
- Location: Online. duh
Yes.Manxome wrote: remember that the maximum average lossless compression ratio for arbitrary information is actually one--that is, no compression at all.
Not necessarily. Not even for lossless."Compression" is actually always a way to say a few things efficiently at the cost of making everything else longer (or impossible) to say.
If you assume that current somewhat efficient compression and decompression algorithms are known to both the sender and the receiver before the message is sent, that maximum average is your *worst* case. LZW compression may not actually save you anything, but it's guaranteed to not be any *longer* than the original message. (unless you want to split hairs and say that the prior knowledge of the LZW algorithm lengthens the message itself.)
Last edited by Josh_Kablack on Sat Jan 17, 2009 6:01 am, edited 2 times in total.
"But transportation issues are social-justice issues. The toll of bad transit policies and worse infrastructure—trains and buses that don’t run well and badly serve low-income neighborhoods, vehicular traffic that pollutes the environment and endangers the lives of cyclists and pedestrians—is borne disproportionately by black and brown communities."
- Josh_Kablack
- King
- Posts: 5317
- Joined: Fri Mar 07, 2008 7:54 pm
- Location: Online. duh
I disagree on that, - the inability to answer many questions definitively is precisely what makes the idea of the library interesting.Manxome wrote:So in a sense, that's an even less interesting question, because we can't even answer it one way or the other.
"But transportation issues are social-justice issues. The toll of bad transit policies and worse infrastructure—trains and buses that don’t run well and badly serve low-income neighborhoods, vehicular traffic that pollutes the environment and endangers the lives of cyclists and pedestrians—is borne disproportionately by black and brown communities."
What about the "this part is not compressed" tag, doesn't that have length?Josh_Kablack wrote:If you assume that current somewhat efficient compression and decompression algorithms are known to both the sender and the receiver before the message is sent, that maximum average is your *worst* case. LZW compression may not actually save you anything, but it's guaranteed to not be any *longer* than the original message. (unless you want to split hairs and say that the prior knowledge of the LZW algorithm lengthens the message itself.)
"No, you can't burn the inn down. It's made of solid fire."
Sorry, but I'm highly confident that you're wrong. Unless the wikipedia article is incorrect, or I've completely misunderstood it, it is entirely possible for LZW compression to make a message longer than it originally was. For example, if you use it to compress the alphabet (each possible character appears exactly once), that will result in a "compressed" file that is actually longer than the "uncompressed" version. The "codes" that LZW outputs need to gradually grow longer as the compression table grows in size, and that means that if there aren't actually any repeated substrings, you're wasting space. Notice how the example compression on wikipedia shifts from outputting 5 bits at a time to 6 bits at a time part way through?Josh_Kablack wrote:Not necessarily. Not even for lossless.
If you assume that current somewhat efficient compression and decompression algorithms are known to both the sender and the receiver before the message is sent, that maximum average is your *worst* case. LZW compression may not actually save you anything, but it's guaranteed to not be any *longer* than the original message. (unless you want to split hairs and say that the prior knowledge of the LZW algorithm lengthens the message itself.)
If the "compressed" version is longer, you can fall back on sending the "uncompressed" version instead, but then you need some sort of label indicating which mesages are compressed and which aren't, so you lose space there instead.
An average compression ratio of 1 is the best possible for any algorithm if you have to cope with arbitrary input; that's a very simple information-theoretic result. Natural language tends to compress very well because it has lots of redundancy--that is, most strings of English characters have no actual English meaning. If you can make "hello" shorter at the cost of making "zjxcb" longer, that's a win for most email. But if every possible combination of input is meaningful, then you can't make some of them shorter without making others longer or losing some expressivity. The total number of distinct messages you can send is limited by your alphabet size and message length. That's simple combinatorics.
Last edited by Manxome on Sat Jan 17, 2009 7:57 am, edited 1 time in total.
OK, let me propose an even more "interesting" scenario: you are in a room with arbitrary properties. I do not specify what any of those properties are. Now you can't answer any question about the room definitively! Isn't that much more interesting?Josh_Kablack wrote:I disagree on that, - the inability to answer many questions definitively is precisely what makes the idea of the library interesting.
Couple of questions on this topic.'
1) Where can I get a start on learning on Information theory? books, websites, etc.?
2.) Manxome, What is the compression rate of the alphabet compared to the uncompressed string?
I ask this because you said that the minimum compression is 1. So are you stating that the uncompressed string has a rate of 1 and the alphabet has a rate of 1? So out of string "XYBVE", "B" could be made as long or longer than "XYBVE"?
1) Where can I get a start on learning on Information theory? books, websites, etc.?
2.) Manxome, What is the compression rate of the alphabet compared to the uncompressed string?
I ask this because you said that the minimum compression is 1. So are you stating that the uncompressed string has a rate of 1 and the alphabet has a rate of 1? So out of string "XYBVE", "B" could be made as long or longer than "XYBVE"?
Ancient History wrote:We were working on Street Magic, and Frank asked me if a houngan had run over my dog.
A ratio of 1 is the best average compression you can get across all possible messages; "good" compression algorithms exploit the fact that we say some things much more often than others and make the common messages short at the cost of making the uncommon messages long.A_Cynic wrote:2.) Manxome, What is the compression rate of the alphabet compared to the uncompressed string?
I ask this because you said that the minimum compression is 1. So are you stating that the uncompressed string has a rate of 1 and the alphabet has a rate of 1? So out of string "XYBVE", "B" could be made as long or longer than "XYBVE"?
If you used LZW to compress a message consisting of the entire ASCII alphabet (all 2^8 possible ASCII characters, including the unprintable ones), depending on the parameters you used, you'd likely end up with a "compressed" message that was around 9 * (2^8) bits long, whereas the "uncompressed" message would be 8 * (2^8) bits long--so the "compressed" version is about 12.5% longer (worse examples could probably be found). Other messages, such as "KHAAAAAAAAAAAAAAAAAAAAN!", would end up being shorter after you compressed them, because they have lots of repetition (and LZW is designed to make repetitive messages shorter). But if you took every possible thing you could ever say (up to some maximum length) and "compressed" all of them, the lengths would stay the same (or possibly grow longer) on average, even though some individual messages would become shorter.
- CatharzGodfoot
- King
- Posts: 5668
- Joined: Fri Mar 07, 2008 7:54 pm
- Location: North Carolina
As long as there is enough repetition within the message, a Huffman tree encoding will generate a smaller output regardless of the number of possibly meaningful substrings.Manxome wrote: An average compression ratio of 1 is the best possible for any algorithm if you have to cope with arbitrary input; that's a very simple information-theoretic result. Natural language tends to compress very well because it has lots of redundancy--that is, most strings of English characters have no actual English meaning. If you can make "hello" shorter at the cost of making "zjxcb" longer, that's a win for most email. But if every possible combination of input is meaningful, then you can't make some of them shorter without making others longer or losing some expressivity. The total number of distinct messages you can send is limited by your alphabet size and message length. That's simple combinatorics.
It's true that encoding the alphabet with a Huffman tree lands you a longer message than you started with, but for messages significantly longer than the size of the alphabet you'll never end up with anything more than a constant amount larger than the original.
Is there a proof for that? While it is possible that the asymptotic behavior never adds more than a constant amount to the length of the message, it's not at all obvious to me that this is the case for this algorithm.CatharzGodfoot wrote:It's true that encoding the alphabet with a Huffman tree lands you a longer message than you started with, but for messages significantly longer than the size of the alphabet you'll never end up with anything more than a constant amount larger than the original.
For example, a string of the form "ABACADAEAF....AYAZBCBDBE..." is substantially longer than the alphabet, but still compresses extremely poorly (no repeated bigrams). I'm not 100% certain that you could find a similar abusive pattern for arbitrarily long strings, but I don't see any obvious reason that you couldn't.
And even if this claim is correct, it doesn't change the fact that the average compression ratio across all possible messages under a fixed length can't be better than 1. It will just mean that for every string that attains a linear (rather than constant) compression, there will be a linear number of strings that get a constant amount worse, to maintain the average.
Um...yes, the algorithm compresses repetitive messages, so as long as the message is repetitive, it will get shorter. If you intended some more insightful point than that, I'm afraid it has eluded me.CatharzGodfoot wrote:As long as there is enough repetition within the message, a Huffman tree encoding will generate a smaller output regardless of the number of possibly meaningful substrings.
